Last updated: 2025-06-20


Knit directory: casper_ss_ma/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute                                    relative
/Volumes/scratch/DIMA/piva/casper_ss_ma/    ..

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version f0e862c. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/figure/

Untracked files:
    Untracked:  .DS_Store
    Untracked:  analysis/.DS_Store
    Untracked:  analysis/02_degs_go_aneuploidy_median.Rmd
    Untracked:  analysis/03_degs_go_CD82expr_median.Rmd
    Untracked:  analysis/VennDiagram.2025-06-09_13-53-40.335615.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-54-51.029086.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-55-15.147126.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-56-18.122749.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-56-30.934079.log
    Untracked:  analysis/VennDiagram.2025-06-09_14-18-19.412377.log
    Untracked:  analysis/VennDiagram.2025-06-18_10-28-53.699452.log
    Untracked:  analysis/VennDiagram.2025-06-18_10-37-36.77178.log
    Untracked:  analysis/VennDiagram.2025-06-18_11-32-36.228427.log
    Untracked:  analysis/VennDiagram.2025-06-18_15-38-55.387683.log
    Untracked:  analysis/VennDiagram.2025-06-18_15-48-17.579371.log
    Untracked:  analysis/VennDiagram.2025-06-18_17-18-17.268774.log
    Untracked:  analysis/VennDiagram.2025-06-19_11-11-17.376961.log
    Untracked:  analysis/VennDiagram.2025-06-19_14-52-46.049026.log
    Untracked:  analysis/VennDiagram.2025-06-19_16-40-05.861139.log
    Untracked:  analysis/VennDiagram.2025-06-19_16-40-07.33202.log
    Untracked:  analysis/VennDiagram.2025-06-19_16-40-08.673023.log
    Untracked:  analysis/VennDiagram.2025-06-19_17-50-05.238063.log
    Untracked:  analysis/VennDiagram.2025-06-19_17-50-07.22979.log
    Untracked:  analysis/VennDiagram.2025-06-19_17-50-09.007028.log
    Untracked:  analysis/VennDiagram.2025-06-19_18-48-01.885712.log
    Untracked:  analysis/VennDiagram.2025-06-19_18-48-03.579702.log
    Untracked:  analysis/VennDiagram.2025-06-19_18-48-04.898695.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-18-23.300456.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-18-24.588109.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-18-26.077856.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-50-54.081682.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-50-55.516535.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-50-56.913582.log
    Untracked:  analysis/VennDiagram.2025-06-20_11-10-43.68944.log
    Untracked:  analysis/VennDiagram.2025-06-20_11-10-45.681514.log
    Untracked:  analysis/VennDiagram.2025-06-20_11-10-47.126222.log
    Untracked:  analysis/hsa04064.HLT-HighAS_vs_HLT-LowAS.png
    Untracked:  analysis/hsa04064.HLT-HighCD82_vs_HLT-LowCD82.png
    Untracked:  analysis/hsa04064.HRplus-HighAS_vs_HRplus-LowAS.png
    Untracked:  analysis/hsa04064.HRplus-HighCD82_vs_HRplus-LowCD82.png
    Untracked:  analysis/hsa04064.HRplus_vs_HLT.png
    Untracked:  analysis/hsa04064.TNBC-HighAS_vs_TNBC-LowAS.png
    Untracked:  analysis/hsa04064.TNBC-HighCD82_vs_TNBC-LowCD82.png
    Untracked:  analysis/hsa04064.TNBC_vs_HLT.png
    Untracked:  analysis/hsa04064.TNBC_vs_HRplus.png
    Untracked:  analysis/hsa04064.png
    Untracked:  analysis/hsa04064.xml
    Untracked:  code/
    Untracked:  data/
    Untracked:  degs_HLT-HighAS_vs_HLT-LowAS.csv
    Untracked:  degs_HLT-HighCD82_vs_HLT-LowCD82.csv
    Untracked:  degs_HRplus-HighAS_vs_HRplus-LowAS.csv
    Untracked:  degs_TNBC-HighCD82_vs_TNBC-LowCD82.csv
    Untracked:  output/

Unstaged changes:
    Modified:   analysis/00_casper_analysis.Rmd
    Deleted:    analysis/02_deconvolution.Rmd
    Modified:   analysis/index.Rmd
    Modified:   casper_ss_ma.Rproj

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/01_degs_go.Rmd) and HTML (docs/01_degs_go.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author         Date        Message
Rmd   f0e862c  annamariapiva  2025-06-20  new reports

Introduction

The goal of this analysis is to identify which pathways are up- or down-regulated in each condition (Healthy, HR+, and TNBC). The following comparisons have been performed:

  • HR+ vs Healthy

  • TNBC vs Healthy

  • TNBC vs HR+

Overview of the analysis steps

The analysis includes:

    1. Differential gene expression analysis with DESeq2
    2. Gene set enrichment analysis with clusterProfiler
    3. Gene set enrichment analysis with fgsea (fast GSEA) on the curated Human MSigDB Collections. In particular, the hallmark gene sets summarize and represent specific, well-defined biological states or processes.
    4. Focus on the NF-kB pathway. The R package pathview is used to visualize differentially expressed genes on the KEGG NF-kB signaling pathway

The input for the following analysis is:

  • counts matrix, produced by Salmon and normalized with the variance stabilizing transformation (VST) in DESeq2, where each row represents one sample and each column represents one gene, so each cell holds the expression level of a specific gene in a particular sample. VST aims to produce values whose variance is approximately constant across the range of mean values, which matters most for genes with low mean counts;
  • samples info, including sample name, condition (HRplus, TNBC, Healthy) and batch (240919_rnaseq, 250501_rnaseq).
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

Loading R packages and input data

The first steps of the analysis in R are to load the required packages, load the input data mentioned above and set the thresholds for the analysis:

  • min_sample = 2, the minimum number of samples in which a gene needs to have at least 1 read;
  • logfc = log2(2) = 1, the threshold on the log2 fold change, that is, the ratio between the expression levels of a gene in the two conditions compared, on a base-2 logarithmic scale; a log fold change greater than 1 means that the expression of that gene is higher in group1 than in group2 by a multiplicative factor of 2^logfc (more than two-fold);
  • qvalue = 0.01, the threshold on the adjusted p-value, which can be interpreted as a false discovery rate: the expected proportion of false positives among all genes called significant. A low threshold limits the number of genes reported as differentially expressed when they are not. The logfc and qvalue thresholds were chosen based on commonly used values.
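
As a worked illustration of these thresholds, the sketch below uses base R only and made-up p-values. It assumes the adjusted p-values are Benjamini-Hochberg corrected (the default in DESeq2); the report does not state which correction was actually used, and all variable names are hypothetical.

```r
# Thresholds as described above.
logfc_cutoff <- log2(2)   # = 1, i.e. a two-fold change
q_cutoff     <- 0.01

# A log2 fold change of 3 corresponds to an 8-fold increase in group1 vs group2.
fold_increase <- 2^3

# Benjamini-Hochberg adjustment of hypothetical raw p-values.
pvals <- c(0.0001, 0.004, 0.03, 0.2)
padj  <- p.adjust(pvals, method = "BH")

# Number of tests that pass the adjusted p-value threshold.
n_significant <- sum(padj < q_cutoff)
```

With these four values, the two smallest p-values remain below 0.01 after adjustment, while the third (0.03 raw) does not.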

Quality control

PCA

Let’s have a look at the PCA and at the gene expression pattern across samples. The batch effect has been accounted for in the design.
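
A minimal sketch of how such a PCA can be computed with base R, using a simulated matrix in place of the real VST data. The matrix, its dimensions and the simulated condition effect are all assumptions made for illustration; the report's own plotting code is not shown.

```r
set.seed(12345)

# Hypothetical VST-like matrix: 6 samples (rows) x 50 genes (columns).
expr <- matrix(rnorm(6 * 50, mean = 8, sd = 1), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:50)))

# Shift the first 10 genes in the last three samples to mimic a condition effect.
expr[4:6, 1:10] <- expr[4:6, 1:10] + 3

# prcomp() treats rows as observations, so samples must be in rows.
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Fraction of total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on PC1 and PC2, e.g. to plot coloured by condition or batch.
scores <- pca$x[, 1:2]
```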

Correlation and distance between samples

To evaluate the similarity between RNA-seq samples, we computed both Pearson correlation and Euclidean distance using variance-stabilized expression data:

  • Pearson correlation (cor()): Measures the degree to which gene expression profiles vary in a similar pattern across samples. Values range from -1 (perfect inverse linear relationship) to +1 (perfect positive linear relationship). High correlations indicate that samples share similar expression trends.

  • Euclidean distance (dist()): Quantifies the overall dissimilarity in expression profiles between samples, based on their absolute expression values. Smaller distances indicate more similar samples.

These metrics provide complementary views: correlation focuses on shared expression patterns (direction), while distance captures overall differences in magnitude. Both are useful for identifying sample clusters, detecting outliers, and validating experimental reproducibility.
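
The difference between the two metrics can be seen on a toy matrix. The sketch below uses made-up values with samples in rows, as described for the input matrix; note that cor() works on columns while dist() works on rows, hence the transpose.

```r
# Hypothetical matrix with samples in rows and genes in columns.
expr <- rbind(
  s1 = c(1, 2, 3, 4),
  s2 = c(2, 4, 6, 8),   # same pattern as s1, twice the magnitude
  s3 = c(4, 3, 2, 1)    # reversed pattern
)
colnames(expr) <- paste0("gene", 1:4)

# cor() correlates columns, so transpose to compare samples.
sample_cor <- cor(t(expr), method = "pearson")

# dist() measures distances between rows, i.e. between samples here.
sample_dist <- as.matrix(dist(expr, method = "euclidean"))
```

Here s1 and s2 are perfectly correlated yet farther apart in Euclidean distance than s1 and s3, which are perfectly anti-correlated. This is why the two views are complementary.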

Differential expression analysis

Differential expression analysis is performed using a custom function, which accounts for the batch effect. A batch effect occurs when non-biological factors in an experiment, such as laboratory conditions or the instruments used, cause changes in the data it produces. Lowly expressed genes are removed to reduce noise. Here, lowly expressed genes are defined as:

  • genes whose total number of reads is less than half the number of samples;
  • genes expressed in fewer samples than the number of conditions.
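
One possible reading of these two filters, sketched in base R on a made-up count matrix. The custom function used in the report is not shown, so the exact comparisons (for example, treating "expressed" as at least 1 read) are assumptions.

```r
# Hypothetical raw count matrix: genes in rows, samples in columns.
counts <- rbind(
  geneA = c(10, 12, 0, 8, 15, 9),
  geneB = c(1, 0, 0, 0, 1, 0),
  geneC = c(0, 0, 0, 0, 0, 30),
  geneD = c(5, 0, 7, 0, 0, 0)
)
colnames(counts) <- paste0("sample", 1:6)
n_conditions <- 3   # Healthy, HRplus, TNBC

# Criterion 1: keep genes whose total reads reach half the number of samples.
enough_reads <- rowSums(counts) >= ncol(counts) / 2

# Criterion 2: keep genes with at least 1 read in at least as many samples
# as there are conditions.
enough_samples <- rowSums(counts >= 1) >= n_conditions

filtered <- counts[enough_reads & enough_samples, , drop = FALSE]
```

In this toy example only geneA survives: geneB has too few reads overall, while geneC and geneD are detected in too few samples.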

Contrast 1: HRplus vs HLT

MA plot and volcano plot

Genes are annotated as significant or not significant. A gene is considered significant, that is, showing a meaningful change, when its adjusted p-value is below the qvalue threshold and its absolute log2FoldChange is greater than the logfc cutoff defined above.
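
The annotation rule can be sketched as follows on a hypothetical results table. The column names log2FoldChange and padj follow DESeq2's output; the columns significant and direction are names chosen here for illustration.

```r
# Hypothetical differential expression results.
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, -1.8, 0.4, 3.0),
  padj           = c(0.001, 0.005, 0.0001, 0.2)
)
logfc_cutoff <- 1
q_cutoff     <- 0.01

# A gene is significant only if it passes both thresholds;
# genes with a missing adjusted p-value are treated as not significant.
res$significant <- !is.na(res$padj) &
  res$padj < q_cutoff &
  abs(res$log2FoldChange) > logfc_cutoff

# Label the direction of change for significant genes.
res$direction <- ifelse(!res$significant, "ns",
                        ifelse(res$log2FoldChange > 0, "up", "down"))
```

Note that g3 has a very small adjusted p-value but a small effect size, and g4 has a large effect size but a high adjusted p-value; neither is called significant.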

Table of all differentially expressed genes

Heatmap for top 20 genes

Starting from the significant genes among the differentially expressed genes computed above, let’s visualize the top 20 and then all the DE genes.

Meaning of Colors

  • Red: Indicates high expression for that gene in a given sample (value above average, positive compared to the standardized scale).
  • Blue: Indicates low expression for that gene in a given sample (value below average, negative compared to the standardized scale).
  • White (or intermediate color): Indicates an expression close to the average (standardized value around 0).
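
The standardized scale referred to above is typically a per-gene z-score. A minimal sketch with made-up values, assuming genes in rows as is usual for heatmaps (the report does not show how the scaling was actually done):

```r
# Hypothetical expression matrix: genes in rows, samples in columns.
expr <- rbind(
  geneA = c(2, 4, 6),
  geneB = c(10, 10, 40)
)
colnames(expr) <- c("s1", "s2", "s3")

# scale() standardizes columns, so transpose to standardize each gene,
# then transpose back.
z <- t(scale(t(expr)))
```

After scaling, each gene has mean 0 and standard deviation 1 across samples, so positive values (red) and negative values (blue) are comparable between genes with very different absolute expression.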

Gene set enrichment analysis

Further analysis is done through gene set enrichment analysis (GSEA). Unlike the steps above, GSEA uses all genes rather than only those passing the logfc and adjusted p-value thresholds. GSEA is performed separately on each GO subontology: biological processes (BP), cellular components (CC) and molecular functions (MF). The dot plot below shows the top 10 most enriched GO terms. The size of each dot reflects the count of differentially expressed genes associated with each GO term, and the color of each dot reflects the significance of the enrichment of that term.
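
GSEA functions such as clusterProfiler's gseGO() take a named numeric vector sorted in decreasing order as input. The sketch below builds such a vector in base R, assuming log2 fold change is the ranking metric; the report does not state which metric was used, and the gene names are placeholders.

```r
# Hypothetical results for all tested genes (no threshold applied).
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(0.3, -2.1, 1.7, NA)
)

# Drop genes without an estimate, then build a named vector.
res    <- res[!is.na(res$log2FoldChange), ]
ranked <- setNames(res$log2FoldChange, res$gene)

# Sort from most up-regulated to most down-regulated.
ranked <- sort(ranked, decreasing = TRUE)
```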

GSEA msigdb

Pathway viewer: focus on NF-kB signaling pathway (KEGG id: hsa04064)

To visualize gene expression changes on biological pathways, we used the pathview R package, which maps gene-level statistics (e.g., log2 fold-changes) onto KEGG pathway diagrams.

For each contrast in our differential expression analysis, we extracted significantly differentially expressed genes and passed their log2 fold-change values to pathview() to visualize the NF-kappa B signaling pathway (KEGG pathway ID “hsa04064”). Pathway visualizations highlight upregulated and downregulated genes in red and blue, respectively, based on log2 fold-change.
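
A sketch of how the input for pathview() might be prepared. The Entrez IDs are real (NFKB1, RELA, TNF), but the fold-change values, the significant column and the helper draw_nfkb() are hypothetical. The colour settings are an assumption chosen to match the red/blue scheme described above. The helper is only defined, not called, because pathview() needs network access to download the KEGG map.

```r
# Hypothetical results keyed by Entrez gene ID (pathview's default ID type).
res <- data.frame(
  entrez         = c("4790", "5970", "7124"),   # NFKB1, RELA, TNF
  log2FoldChange = c(1.6, -1.2, 2.3),
  significant    = c(TRUE, FALSE, TRUE)
)

# Keep significant genes and build a named vector of log2 fold changes.
sig     <- res[res$significant, ]
gene_fc <- setNames(sig$log2FoldChange, sig$entrez)

# Hypothetical wrapper: writes hsa04064.<contrast>.png to the working directory.
draw_nfkb <- function(fc, contrast) {
  pathview::pathview(
    gene.data  = fc,
    pathway.id = "04064",
    species    = "hsa",
    out.suffix = contrast,
    low  = list(gene = "blue", cpd = "blue"),
    mid  = list(gene = "gray", cpd = "gray"),
    high = list(gene = "red", cpd = "yellow")
  )
}
```

A call such as draw_nfkb(gene_fc, "HRplus_vs_HLT") would then produce one image per contrast.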

Contrast 2: TNBC_vs_HLT

MA plot and volcano plot

Genes are annotated as significant or not significant. A gene is considered significant, that is, showing a meaningful change, when its adjusted p-value is below the qvalue threshold and its absolute log2FoldChange is greater than the logfc cutoff defined above.

Heatmap for top 20 genes

Starting from the significant genes among the differentially expressed genes computed above, let’s visualize the top 20 and then all the DE genes.

Meaning of Colors

  • Red: Indicates high expression for that gene in a given sample (value above average, positive compared to the standardized scale).
  • Blue: Indicates low expression for that gene in a given sample (value below average, negative compared to the standardized scale).
  • White (or intermediate color): Indicates an expression close to the average (standardized value around 0).

Gene set enrichment analysis

GSEA msigdb

Contrast 3: TNBC vs HRplus

MA plot and volcano plot

Genes are annotated as significant or not significant. A gene is considered significant, that is, showing a meaningful change, when its adjusted p-value is below the qvalue threshold and its absolute log2FoldChange is greater than the logfc cutoff defined above.

Heatmap for top 20 genes

Starting from the significant genes among the differentially expressed genes computed above, let’s visualize the top 20 and then all the DE genes.

Meaning of Colors

  • Red: Indicates high expression for that gene in a given sample (value above average, positive compared to the standardized scale).
  • Blue: Indicates low expression for that gene in a given sample (value below average, negative compared to the standardized scale).
  • White (or intermediate color): Indicates an expression close to the average (standardized value around 0).

Gene set enrichment analysis

GSEA msigdb

Common pathways

Biological Processes Pathways

Cellular Components Pathways

Molecular Functions Pathways

GSEA Heatmap Hallmarks - all comparisons

Table of all genes


R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] VennDiagram_1.7.3           futile.logger_1.4.3        
 [3] tibble_3.3.0                fgsea_1.26.0               
 [5] msigdbr_24.1.0              gridExtra_2.3              
 [7] dplyr_1.1.4                 clusterProfiler_4.8.2      
 [9] plotly_4.10.4               reshape_0.8.9              
[11] ggplot2_3.5.2               gplots_3.2.0               
[13] RColorBrewer_1.1-3          ComplexHeatmap_2.16.0      
[15] rtracklayer_1.60.1          DESeq2_1.40.2              
[17] SummarizedExperiment_1.30.2 Biobase_2.60.0             
[19] MatrixGenerics_1.12.3       matrixStats_1.5.0          
[21] GenomicRanges_1.52.1        GenomeInfoDb_1.36.4        
[23] IRanges_2.34.1              S4Vectors_0.38.2           
[25] BiocGenerics_0.46.0         DT_0.33                    

loaded via a namespace (and not attached):
  [1] rstudioapi_0.17.1        jsonlite_2.0.0           shape_1.4.6.1           
  [4] magrittr_2.0.3           farver_2.1.2             rmarkdown_2.29          
  [7] GlobalOptions_0.1.2      fs_1.6.6                 BiocIO_1.10.0           
 [10] zlibbioc_1.46.0          vctrs_0.6.5              memoise_2.0.1           
 [13] Rsamtools_2.16.0         RCurl_1.98-1.17          ggtree_3.8.2            
 [16] htmltools_0.5.8.1        S4Arrays_1.0.6           lambda.r_1.2.4          
 [19] curl_6.3.0               gridGraphics_0.5-1       sass_0.4.10             
 [22] KernSmooth_2.23-26       bslib_0.9.0              htmlwidgets_1.6.4       
 [25] plyr_1.8.9               futile.options_1.0.1     cachem_1.1.0            
 [28] GenomicAlignments_1.36.0 whisker_0.4.1            igraph_2.1.4            
 [31] lifecycle_1.0.4          iterators_1.0.14         pkgconfig_2.0.3         
 [34] gson_0.1.0               Matrix_1.6-4             R6_2.6.1                
 [37] fastmap_1.2.0            GenomeInfoDbData_1.2.10  clue_0.3-66             
 [40] aplot_0.2.5              digest_0.6.37            enrichplot_1.20.0       
 [43] colorspace_2.1-1         patchwork_1.3.0          AnnotationDbi_1.62.2    
 [46] rprojroot_2.0.4          crosstalk_1.2.1          RSQLite_2.4.1           
 [49] org.Hs.eg.db_3.17.0      labeling_0.4.3           polyclip_1.10-7         
 [52] httr_1.4.7               abind_1.4-8              compiler_4.3.1          
 [55] bit64_4.6.0-1            withr_3.0.2              doParallel_1.0.17       
 [58] downloader_0.4.1         BiocParallel_1.34.2      viridis_0.6.5           
 [61] DBI_1.2.3                ggforce_0.4.2            MASS_7.3-60             
 [64] DelayedArray_0.26.7      rjson_0.2.23             HDO.db_0.99.1           
 [67] gtools_3.9.5             caTools_1.18.3           tools_4.3.1             
 [70] scatterpie_0.2.4         ape_5.8-1                httpuv_1.6.16           
 [73] glue_1.8.0               restfulr_0.0.15          nlme_3.1-168            
 [76] GOSemSim_2.26.1          promises_1.3.3           shadowtext_0.1.4        
 [79] cluster_2.1.8.1          reshape2_1.4.4           generics_0.1.4          
 [82] gtable_0.3.6             tidyr_1.3.1              data.table_1.17.6       
 [85] tidygraph_1.3.1          XVector_0.40.0           ggrepel_0.9.6           
 [88] foreach_1.5.2            pillar_1.10.2            stringr_1.5.1           
 [91] babelgene_22.9           yulab.utils_0.2.0        later_1.4.2             
 [94] circlize_0.4.16          splines_4.3.1            tweenr_2.0.3            
 [97] treeio_1.24.3            lattice_0.22-7           bit_4.6.0               
[100] tidyselect_1.2.1         GO.db_3.17.0             locfit_1.5-9.12         
[103] Biostrings_2.68.1        knitr_1.50               git2r_0.36.2            
[106] xfun_0.52                graphlayouts_1.2.2       stringi_1.8.7           
[109] ggfun_0.1.8              workflowr_1.7.1          lazyeval_0.2.2          
[112] yaml_2.3.10              evaluate_1.0.4           codetools_0.2-20        
[115] ggraph_2.2.1             qvalue_2.32.0            ggplotify_0.1.2         
[118] cli_3.6.5                jquerylib_0.1.4          Rcpp_1.0.14             
[121] png_0.1-8                XML_3.99-0.18            parallel_4.3.1          
[124] assertthat_0.2.1         blob_1.2.4               DOSE_3.26.2             
[127] bitops_1.0-9             tidytree_0.4.6           viridisLite_0.4.2       
[130] scales_1.4.0             purrr_1.0.4              crayon_1.5.3            
[133] GetoptLong_1.0.5         rlang_1.1.6              formatR_1.14            
[136] cowplot_1.1.3            fastmatch_1.1-6          KEGGREST_1.40.1